Simple Example

Output

seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"

seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"

out <- statSignf(seq1, seq2)
The names of the sequences are 'random sequence 1 consisting of 20 residues' and 
'random sequence 1 consisting of 20 residues'. 
These sequences are of type 'Protein'. 
The type of alignment that has been done here is 'global'. 
The substitution matrix used is 'BLOSUM62'. 
The gap scores are -3 for open penalty and -1 for extended penalty. 
The number of shuffles done are 1000. 
The shuffling was done on the first sequence mentioned above. 
The original score is 1. 
The estimations of the parameters of the Gumbel distribution are: 
 Scale parameter lambda = 0.2393786 and mode u = -1.859929. 
The p-value estimated using the estimated Gumbel distribution is 0.3960651. 
The p-value estimated empirically by counting in the histogram is 0.411. 
The estimated K is 0.001601696. 
The standardized score is S' = 0.684606. 
Summary of the 1000 scores calculated after shuffling: 
Min.     1st Qu.  Median   Mean     3rd Qu.  Max     
 -15.000   -3.000    0.000    0.551    4.000   20.000

Plot

out()

Local Alignment

Output

seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"

seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"

out <- statSignf(seq1, seq2, num.shuffles = 3000,
      subst.matrix = "BLOSUM45", kind.align = "local")
The names of the sequences are 'random sequence 1 consisting of 20 residues' and 
'random sequence 1 consisting of 20 residues'. 
These sequences are of type 'Protein'. 
The type of alignment that has been done here is 'local'. 
The substitution matrix used is 'BLOSUM45'. 
The gap scores are -3 for open penalty and -1 for extended penalty. 
The number of shuffles done are 3000. 
The shuffling was done on the first sequence mentioned above. 
The original score is 22. 
The estimations of the parameters of the Gumbel distribution are: 
 Scale parameter lambda = 0.287885 and mode u = 20.82963. 
The p-value estimated using the estimated Gumbel distribution is 0.5102972. 
The p-value estimated empirically by counting in the histogram is 0.4846667. 
The estimated K is 1.005085. 
The standardized score is S' = 0.336933. 
Summary of the 3000 scores calculated after shuffling: 
Min.     1st Qu.  Median   Mean     3rd Qu.  Max     
13.00000 20.00000 22.00000 22.83433 26.00000 42.00000

Plot

out()

Non-random Sequences

Output

seq1 <- ">non-random sequence 1 very similar to 2.
KKKKKKKKKKRRRRNNRRLLLMMMNM"

seq2 <- ">non-random sequence 2 very similar to 1.
KKKKKKKKKKKRRRRRRLLLLMMMNN"

out <- statSignf(seq1, seq2, num.shuffles = 2000,
      subst.matrix = "BLOSUM100", shuffle.seq = 2, gap.scores = c(8, 2))
The names of the sequences are 'non-random sequence 1 very similar to 2' and 
'non-random sequence 2 very similar to 1'. 
These sequences are of type 'Protein'. 
The type of alignment that has been done here is 'global'. 
The substitution matrix used is 'BLOSUM100'. 
The gap scores are -8 for open penalty and -2 for extended penalty. 
The number of shuffles done are 2000. 
The shuffling was done on the second sequence mentioned above. 
The original score is 199. 
The estimations of the parameters of the Gumbel distribution are: 
 Scale parameter lambda = 0.06495775 and mode u = 51.15488. 
The p-value estimated using the estimated Gumbel distribution is 6.747724e-05. 
The p-value estimated empirically by counting in the histogram is 0. 
The estimated K is 0.04103675. 
The standardized score is S' = 9.603686. 
Summary of the 2000 scores calculated after shuffling: 
Min.     1st Qu.  Median   Mean     3rd Qu.  Max     
  7.0000  46.0000  59.0000  60.0395  73.0000 139.0000

Plot

out()

Longer Sequences

Row

Sequence 1

seq1 <- ">random sequence 1 consisting of 500 residues.
YWWCSKEWDHFPDVTTCTPSEQYPAWAMHNMILPNWQLAEMMQSYRYIPNIALPNHQLTK
AHSMWAQHSMYVQRCCETKDLLFNDDTSQDKAYPFKQMESNHHITNEFPNKCKILTSVKH
ATLWNLQALGCCKDPCRSNKFHKKLNIDIHCSPGWTGWNAQYSSPGPFCWEPKTVNYSWF
RFKWHYFPCVHNIGSSRQCVWLRYFHLSSEQWKQGARGWVVVIFGACSGWYPWDNGQLYQ
EKNIFCAGKGQCATDQYFNYLWSWMISAGWAVYPYDECVTRTVISLFEVAYYFRHPYMWH
NITIMLRNETLPAVTQCVLETLHTHYCLALWLEEMYPCTDEGYNRKTPGDTHVQDCAFFS
ESHKHDVKTNWVGSDSINYNPGSVMWKICDHLGAGMYGRPTPADWSQSIITNHICCGDAS
HCGEQWCAVNNDSVSTMISQFQTSEWAHPIVINQHGIEPDMSWGELARVLTQNPGLGQQN
TIRMKTFYRKFFPCMYFNFQ"

Sequence 2

seq2 <- ">random sequence 1 consisting of 500 residues.
FWMLVNRLCQDTGWEYADCRPDHGNESRMMYKDCFYHITDPAEATVFPIESFCQHLCSNF
WSNTHWRIINYPPLHWKNYCGSAFWGRNYGEWSCFEDRMPFATEIETHSNPVFNDFQEAQ
CNNQARKKNGWKVSAPQEAMPQGMMRLTWIFIHEMWGWFSWVWRLIQNQINEPGVKPLEL
CSETQHGVFGWRDIAIMIRMEYKCDVLFMWLILCPCYYSDQFCKRIAAHQEAMRWKIAKT
AWQFVGKIKGPKCRMLTERRQFACEVTEQACKRCLPHAVRHTGQSHKYHCTAFRPFTKIS
VAYGVEIGESFFFQWYWQFRFSFATAWAISNWGQGWSLCIEEVNWNDIKEHFTFTTKIML
VFTENFTEHSQEIFDSLMDHGPDHKVSIIARQMVACQIWITHKMGCCHDGLLYTPTHDWF
LQVRVPCFFQWVEGWNGIEDDIPGHVRMNSMTSWLNKPLASRILLDIYIMTWDNRWFKWD
KTWMVVGHNIHDAWSFENVK"

Row

Plot

out()

DNA

Row

Sequence 1

seq1 <- ">random sequence 1 consisting of 800 bases.
ctattggccgttggaggggaagtgtatcgactcgtcaagcgctctaatgagaaaccggct
tactcgccctcttatgatccttgacaagcgtctgactccgcaaggatcccgggctgaggg
ctacacccacctcgtttttagttttttaacagcagaaatcagcccgacgaagaacgagga
aggttctattaagagccgacggatccgaagaacacgtggtccctgacccttagaaccgat
tcgcgggctaaaaagcaacacgcttcgacccgatctgtcgcattgaattgaaacgcccct
ttattgtagctactaaccaggcttccttatagctattagcagggctggaagtgcggattc
actctaagacggtcacagctcatcagctcccctgcctgtagctgaacgtaccaatatctg
caagtaacaaccttgagtaccgtaagactgtaaaaatcttcgattccgcattatagggca
aggctaacgcatttccgccaaaggagaatgaggttgttgcattcccgtgagtgaaaggtc
ttagaaaacaagttcgatgcgcaggaggttcagattctgttccccccgtcgcgccgcgcc
tcaaaacccggtaccttcgcaaggagcccccagactctatggctaatcacttgtttgcaa
gaggctctgatccaaatgacatcaacctaacgcacaaagctgaccgaatataaacgattt
tctagcatatcgcgttttaattcgctagctgcagctttcgcatccgtcgccgctttaggc
ggggcgaacatcctgacccg"

Sequence 2

seq2 <- ">random sequence 1 consisting of 800 bases.
gagtgttttcattagcatccgcttagcatgggggtatgctgtactagtagacgtcagaga
taaggactggtcataattcaaggcgccgccatactaatgttagccttgttaatgactaat
aaatcctccccggaaatccataacaacgaagcctcgcgattacccgcaacgagaaaaact
caataaagggaaggcagacaaatgtatgcagatagtgcatgttctgcgaacactccacgt
atttgaacatataatcctaaaggtgtgttacccgagacactccgtccactctcgtcggta
acataagggtcgtcttttagtaacttccagcggtcatactaacaggctcatcgcactggt
agattctgtgcgggtaaacgcattatagacttgtgtcgagcccccggtgctggcgaattg
tccgcaggggcgacaggagtcaaaacgtgcttgaacaagaccgaacctatgtccttctga
gacgggatctccaatcgccccactgcggagcccctggcaccgcttatgccctagcagtca
ctgtacaaaacctatagccgtgccaccactcagcccgtctacggctgcaccgctactcaa
gtcccgagactcacgctgctcgaactgccttactctggtcaatagagcttaggaacctgt
gtaacggcatgcgttctgtgttagttaaatcgtgacatggcgcccaagcacagtcataat
tattgaaagttatctacatgggatgatgagcactgagggacgcaccgcgcctttatccat
ctatgccgcggcccgttcca"

Row

Plot

out()

Suppress Output and Plotting

Output

seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"

seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"

out <- statSignf(seq1, seq2,
       plot = F, suppress.output = T)

Plot

# out() gives an error, since 'out' is no longer a function. 
out
NULL
---
title: "Statistical Significance of a Pairwise Alignment of Sequences"
author: "Alex-Alex-Helena"
date: "10.11.2021"
output: 
  flexdashboard::flex_dashboard:
    orientation: rows
    social: menu
    source_code: embed
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(StatSignfPairSeqAlign)
```

Simple Example
====================================================================================================================================

### Output {data-width=450}

```{r}
seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"

seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"

out <- statSignf(seq1, seq2)
```

### Plot {data-width=550}

```{r}
out()
```

Local Alignment
====================================================================================================================================

### Output {data-width=450}

```{r}
seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"

seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"

out <- statSignf(seq1, seq2, num.shuffles = 3000,
      subst.matrix = "BLOSUM45", kind.align = "local")
```

### Plot {data-width=550}

```{r}
out()
```

Non-random Sequences
====================================================================================================================================

### Output {data-width=450}

```{r}
seq1 <- ">non-random sequence 1 very similar to 2.
KKKKKKKKKKRRRRNNRRLLLMMMNM"

seq2 <- ">non-random sequence 2 very similar to 1.
KKKKKKKKKKKRRRRRRLLLLMMMNN"

out <- statSignf(seq1, seq2, num.shuffles = 2000,
      subst.matrix = "BLOSUM100", shuffle.seq = 2, gap.scores = c(8, 2))
```

### Plot {data-width=550}

```{r}
out()
```


Longer Sequences
====================================================================================================================================

Column {.sidebar data-width=700}
-------------------------------------
### Output
```{r}
out <- statSignf(seq1, seq2,
      subst.matrix = "PAM250", kind.align = "local")
```

Row {data-height=310}
-------------------------------------
### Sequence 1
```{r}
seq1 <- ">random sequence 1 consisting of 500 residues.
YWWCSKEWDHFPDVTTCTPSEQYPAWAMHNMILPNWQLAEMMQSYRYIPNIALPNHQLTK
AHSMWAQHSMYVQRCCETKDLLFNDDTSQDKAYPFKQMESNHHITNEFPNKCKILTSVKH
ATLWNLQALGCCKDPCRSNKFHKKLNIDIHCSPGWTGWNAQYSSPGPFCWEPKTVNYSWF
RFKWHYFPCVHNIGSSRQCVWLRYFHLSSEQWKQGARGWVVVIFGACSGWYPWDNGQLYQ
EKNIFCAGKGQCATDQYFNYLWSWMISAGWAVYPYDECVTRTVISLFEVAYYFRHPYMWH
NITIMLRNETLPAVTQCVLETLHTHYCLALWLEEMYPCTDEGYNRKTPGDTHVQDCAFFS
ESHKHDVKTNWVGSDSINYNPGSVMWKICDHLGAGMYGRPTPADWSQSIITNHICCGDAS
HCGEQWCAVNNDSVSTMISQFQTSEWAHPIVINQHGIEPDMSWGELARVLTQNPGLGQQN
TIRMKTFYRKFFPCMYFNFQ"
```

### Sequence 2
```{r}
seq2 <- ">random sequence 1 consisting of 500 residues.
FWMLVNRLCQDTGWEYADCRPDHGNESRMMYKDCFYHITDPAEATVFPIESFCQHLCSNF
WSNTHWRIINYPPLHWKNYCGSAFWGRNYGEWSCFEDRMPFATEIETHSNPVFNDFQEAQ
CNNQARKKNGWKVSAPQEAMPQGMMRLTWIFIHEMWGWFSWVWRLIQNQINEPGVKPLEL
CSETQHGVFGWRDIAIMIRMEYKCDVLFMWLILCPCYYSDQFCKRIAAHQEAMRWKIAKT
AWQFVGKIKGPKCRMLTERRQFACEVTEQACKRCLPHAVRHTGQSHKYHCTAFRPFTKIS
VAYGVEIGESFFFQWYWQFRFSFATAWAISNWGQGWSLCIEEVNWNDIKEHFTFTTKIML
VFTENFTEHSQEIFDSLMDHGPDHKVSIIARQMVACQIWITHKMGCCHDGLLYTPTHDWF
LQVRVPCFFQWVEGWNGIEDDIPGHVRMNSMTSWLNKPLASRILLDIYIMTWDNRWFKWD
KTWMVVGHNIHDAWSFENVK"
```   

Row {data-height=690}
-------------------------------------
### Plot
```{r}
out()
```

DNA
====================================================================================================================================

Column {.sidebar data-width=650}
-------------------------------------
### Output
```{r}
out <- statSignf(seq1, seq2, kind.seq = "DNA", gap.scores = c(8, 2), 
                 num.shuffles = 200)
```

Row {data-height=450}
-------------------------------------
### Sequence 1
```{r}
seq1 <- ">random sequence 1 consisting of 800 bases.
ctattggccgttggaggggaagtgtatcgactcgtcaagcgctctaatgagaaaccggct
tactcgccctcttatgatccttgacaagcgtctgactccgcaaggatcccgggctgaggg
ctacacccacctcgtttttagttttttaacagcagaaatcagcccgacgaagaacgagga
aggttctattaagagccgacggatccgaagaacacgtggtccctgacccttagaaccgat
tcgcgggctaaaaagcaacacgcttcgacccgatctgtcgcattgaattgaaacgcccct
ttattgtagctactaaccaggcttccttatagctattagcagggctggaagtgcggattc
actctaagacggtcacagctcatcagctcccctgcctgtagctgaacgtaccaatatctg
caagtaacaaccttgagtaccgtaagactgtaaaaatcttcgattccgcattatagggca
aggctaacgcatttccgccaaaggagaatgaggttgttgcattcccgtgagtgaaaggtc
ttagaaaacaagttcgatgcgcaggaggttcagattctgttccccccgtcgcgccgcgcc
tcaaaacccggtaccttcgcaaggagcccccagactctatggctaatcacttgtttgcaa
gaggctctgatccaaatgacatcaacctaacgcacaaagctgaccgaatataaacgattt
tctagcatatcgcgttttaattcgctagctgcagctttcgcatccgtcgccgctttaggc
ggggcgaacatcctgacccg"
```

### Sequence 2
```{r}
seq2 <- ">random sequence 1 consisting of 800 bases.
gagtgttttcattagcatccgcttagcatgggggtatgctgtactagtagacgtcagaga
taaggactggtcataattcaaggcgccgccatactaatgttagccttgttaatgactaat
aaatcctccccggaaatccataacaacgaagcctcgcgattacccgcaacgagaaaaact
caataaagggaaggcagacaaatgtatgcagatagtgcatgttctgcgaacactccacgt
atttgaacatataatcctaaaggtgtgttacccgagacactccgtccactctcgtcggta
acataagggtcgtcttttagtaacttccagcggtcatactaacaggctcatcgcactggt
agattctgtgcgggtaaacgcattatagacttgtgtcgagcccccggtgctggcgaattg
tccgcaggggcgacaggagtcaaaacgtgcttgaacaagaccgaacctatgtccttctga
gacgggatctccaatcgccccactgcggagcccctggcaccgcttatgccctagcagtca
ctgtacaaaacctatagccgtgccaccactcagcccgtctacggctgcaccgctactcaa
gtcccgagactcacgctgctcgaactgccttactctggtcaatagagcttaggaacctgt
gtaacggcatgcgttctgtgttagttaaatcgtgacatggcgcccaagcacagtcataat
tattgaaagttatctacatgggatgatgagcactgagggacgcaccgcgcctttatccat
ctatgccgcggcccgttcca"
```   

Row {data-height=550}
-------------------------------------
### Plot
```{r}
out()
```

Suppress Output and Plotting
====================================================================================================================================

### Output {data-width=500}

```{r}
seq1 <- ">random sequence 1 consisting of 20 residues.
KMMIDIHWGMWWYEYMMCLD"

seq2 <- ">random sequence 1 consisting of 20 residues.
DVYRVCQNVFRYHHFCKRTI"

out <- statSignf(seq1, seq2,
       plot = F, suppress.output = T)
```

### Plot {data-width=500}

```{r}
# out() gives an error, since 'out' is no longer a function. 
out
```